Abstract



Given an undirected graph $G$ and an error parameter $\varepsilon > 0$, the graph sparsification problem requires
sampling edges in $G$ and giving the sampled edges appropriate weights to obtain a sparse graph $G_\varepsilon$ with
the following property: the weight of every cut in $G_\varepsilon$ is within a factor of $(1 \pm \varepsilon)$ of the weight of
the corresponding cut in $G$. If $G$ is unweighted, an $O(m \log n)$-time algorithm for constructing $G_\varepsilon$ with
$O(n \log n/\varepsilon^2)$ edges in expectation, and an $O(m)$-time algorithm for constructing $G_\varepsilon$ with
$O(n \log^2 n/\varepsilon^2)$ edges in expectation, have recently been developed [9]. In this paper, we improve these
results by giving an $O(m)$-time algorithm for constructing $G_\varepsilon$ with $O(n \log n/\varepsilon^2)$ edges in expectation,
for unweighted graphs. Our algorithm is optimal in terms of its time complexity; further, no efficient algorithm
is known for constructing a sparser $G_\varepsilon$. Our algorithm is Monte-Carlo, i.e. it produces the correct output
with high probability, as are all efficient graph sparsification algorithms.



1 Introduction 

A cut of an undirected graph is a partition of its vertices into two disjoint sets. The weight of a cut is the sum 
of weights of the edges crossing the cut, i.e. edges having one endpoint each in the two vertex subsets of 
the partition. For unweighted graphs, each edge is assumed to have unit weight. Cuts play an important role 
in many problems in graphs: e.g., the maximum flow between a pair of vertices is equal to the minimum 
weight cut separating them. 
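
As a small illustration of the definition above, the following sketch computes the weight of a cut. The dictionary representation of the graph and the name `cut_weight` are assumptions made for this example only; they do not come from the paper.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def cut_weight(weights: Dict[Edge, float], side: Set[str]) -> float:
    """Sum the weights of edges with exactly one endpoint in `side`."""
    total = 0.0
    for (u, v), w in weights.items():
        # An edge crosses the cut when its endpoints lie on different sides.
        if (u in side) != (v in side):
            total += w
    return total


# Unweighted 4-cycle a-b-c-d-a: every edge has unit weight.
cycle = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0, ("d", "a"): 1.0}
```

For the 4-cycle, separating a single vertex cuts two edges, while separating two opposite vertices cuts all four.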

A skeleton G' of an undirected graph G is a subgraph of G on the same set of vertices where each edge 
in G' can have an arbitrary weight. The problem of finding an appropriately weighted sparse skeleton for an 
undirected graph G that approximately preserves the weights of all cuts in G was introduced and studied by 
Karger et al. in a series of results [11, 12, 3] culminating in the following theorem. Throughout this paper,
for any undirected graph $G$ and any $\varepsilon \in (0, 1]$, $(1 \pm \varepsilon)G$ will denote the set of all appropriately weighted
subgraphs of $G$ where the weight of every cut in the subgraph is within a factor of $(1 \pm \varepsilon)$ of the weight of
the corresponding cut in $G$.
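
To make the membership condition concrete, here is a brute-force sketch that checks whether a weighted subgraph lies within a factor of (1 ± ε) on every cut. It enumerates all cuts and is therefore only usable on tiny graphs; the function names and the dictionary representation are assumptions of this example, not part of the paper.

```python
from itertools import combinations


def cut_weight(weights, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))


def is_cut_sparsifier(g, h, vertices, eps):
    """True if every cut of h weighs within (1 +/- eps) of the same cut in g."""
    vs = sorted(vertices)
    first, rest = vs[0], vs[1:]
    # Fix `first` on one side so each partition is enumerated exactly once;
    # r < len(rest) keeps the other side non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            wg, wh = cut_weight(g, side), cut_weight(h, side)
            if not ((1 - eps) * wg <= wh <= (1 + eps) * wg):
                return False
    return True


# Unit-weight triangle, and a skeleton keeping two edges at weight 1.5 each.
triangle = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
skeleton = {("a", "b"): 1.5, ("b", "c"): 1.5}
```

In this toy instance every cut of the triangle has weight 2, while the skeleton's cuts weigh 1.5 or 3, so the skeleton qualifies for ε = 0.5 but not for ε = 0.25.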

Theorem 1 (Benczúr-Karger [3]). For any undirected graph $G$ with $m$ edges and $n$ vertices, and for any
error parameter $\varepsilon \in (0, 1]$, a skeleton $G_\varepsilon$ containing $O(n \log n/\varepsilon^2)$ edges in expectation such that
$G_\varepsilon \in (1 \pm \varepsilon)G$ with high probability¹ can be found in $O(m \log^2 n)$ time if $G$ is unweighted and
$O(m \log^3 n)$ time otherwise.

Besides its combinatorial ramifications, the importance of this result stems from its use as a pre-processing
step in several graph algorithms, e.g. to obtain an $\tilde{O}(n^{3/2} + m)$-time algorithm for approximate maximum
flow using the $\tilde{O}(m^{3/2})$-time algorithm for exact maxflow due to Goldberg and Rao [7]; and more recently,
$\tilde{O}(n^{3/2} + m)$-time algorithms for approximate sparsest cut [13, 19].

Subsequent to Benczúr and Karger's work, Spielman and Teng [21, 22] extended their results to preserving
all quadratic forms, of which cuts are a special case; however, the size of the skeleton constructed
was $O(n \log^c n)$ for some large constant $c$. Spielman and Srivastava [20] improved this result by constructing
skeletons of size $O(n \log n/\varepsilon^2)$ in $O(m \log^{O(1)} n)$ time, while continuing to preserve all quadratic forms.
Recently, this result was further improved by Batson et al. [2], who gave a deterministic algorithm for
constructing skeletons of size $O(n/\varepsilon^2)$ that preserve the weights of all cuts. While their result is optimal in
terms of the size of the skeleton constructed, the time complexity of their algorithm is $O(n^3 m/\varepsilon^2)$, which
limits its usefulness in applications.

Recently, further progress has been made on efficiently constructing a skeleton graph in the form of the 
following theorem due to Hariharan and Panigrahi [9].

Theorem 2 (Hariharan-Panigrahi [9]). For an undirected graph $G$ with $m$ edges and $n$ vertices, and for any
error parameter $\varepsilon \in (0, 1]$, the following algorithmic results can be obtained for constructing a skeleton
graph $G_\varepsilon$ that is in $(1 \pm \varepsilon)G$ with high probability:

• If the expected number of edges in $G_\varepsilon$ is $O(n \log^2 n/\varepsilon^2)$, then $G_\varepsilon$ can be constructed in $O(m)$ time if
$G$ has polynomial edge weights and $O(m \log^2 n)$ time if $G$ has arbitrary edge weights.

• If the expected number of edges in $G_\varepsilon$ is $O(n \log n/\varepsilon^2)$, then $G_\varepsilon$ can be constructed in $O(m \log n)$ time
if $G$ is unweighted, and $O(m \log^2 n)$ time if $G$ has polynomial edge weights.



¹ We say that a property holds with high probability (or whp) for a graph on $n$ vertices if its failure probability can be bounded
by the inverse of a fixed polynomial in $n$.



Combining the above two results, one can obtain an algorithm to construct a skeleton graph Gg that preserves 
the weights of all cuts whp and has 0(nlog?i/£^) edges in expectation in 0{m + nlog^n/e^) time if G has 
polynomial edge weights. 

A natural conclusion for this line of work would be to obtain an $O(m)$-time algorithm for constructing
a skeleton graph $G_\varepsilon$ containing $O(n \log n/\varepsilon^2)$ edges in expectation. In this correspondence, we obtain this
result in the form of the following theorem if $G$ is unweighted. It may be noted that even if $G$ is unweighted,
the best sparsification result known previously was Theorem 2.

Theorem 3. For an undirected unweighted graph $G$ with $m$ edges and $n$ vertices, and for any error
parameter $\varepsilon \in (0, 1]$, a skeleton graph $G_\varepsilon$ that is in $(1 \pm \varepsilon)G$ with high probability and has $O(n \log n/\varepsilon^2)$ edges
in expectation can be constructed in $O(m)$ time.

Note that the above algorithm is optimal in terms of its running time, and there is no efficient algorithm
known for constructing a sparser skeleton, even for unweighted graphs. (As mentioned previously, the only
algorithm known that constructs a sparser skeleton has a time complexity of $O(n^3 m/\varepsilon^2)$ [2].)

Before describing our algorithm in more detail, it is worth mentioning some of the related research in
graph sparsification. In a recent result, Fung and Harvey [5] show that sampling uniformly random spanning
trees of a graph produces good sparsifiers. This approach was used previously to obtain coarser sparsifiers
by Goyal et al. [8]. Fung and Harvey also show that sampling edges according to their standard connectivities
produces good sparsifiers, a result obtained independently by Hariharan and Panigrahi [9]. The problem
of graph sparsification in the semi-streaming model was first considered by Ahn and Guha [1], who gave a
one-pass algorithm for constructing a skeleton $G_\varepsilon$ containing $O(n \log n \log(m/n)/\varepsilon^2)$ edges. Recently, Goel
et al. have given the following algorithms for this problem [6]:

• An $O(m \log \log n)$-time one-pass algorithm for constructing a skeleton graph $G_\varepsilon$ containing $O(n \log^2 n/\varepsilon^2)$
edges in expectation. The size of the skeleton can be improved to $O(n \log n/\varepsilon^2)$ edges; however, the
time complexity of the algorithm then becomes 0(mloglog?i + nlog^ n/e^). 

• An $O(m)$-time two-pass algorithm for constructing a skeleton graph $G_\varepsilon$ containing $O(n \log n/\varepsilon^2)$
edges in expectation, if $m = \Omega(n^{1+\delta})$ for some constant $\delta > 0$.

Observe that both results, if applied to a non-streaming model, are weaker than Theorem 2. Another area
of recent interest, though not directly related to our problem, is that of vertex sparsification [15, 14]. Given
a graph $G = (V, E)$ and a subset of vertices $S \subseteq V$, the goal here is to create a graph $G_S = (S, E_S)$ that
approximately preserves some desired connectivity property of $G$ (e.g. minimum Steiner cut [15], maximum
multi-commodity flow [14]).

1.1 Our Techniques 

All previous algorithms for graph sparsification have two phases: in the first phase, a suitable probability
$p_e$ for sampling each edge $e$ is determined; and then, in the second phase, every edge is independently
sampled with probability $p_e$ and given weight $1/p_e$ in the skeleton graph if selected in the sample. Our main
technical novelty is in interleaving the sampling process with that of estimating sampling probabilities. Such
interleaving leads to several technical hurdles:

• It introduces dependence between the sampling of different edges. Such dependence has appeared 
previously in sparsification algorithms for the semi-streaming model, but the nature of the dependence 
in our algorithm is somewhat different from that in the streaming algorithms. 



• An edge may now be sampled multiple times, and errors are accrued in each such sampling. This 
requires us to choose the interleaved sampling probabilities very carefully so that the errors do not 
add up. 
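
For contrast with the interleaved approach, the second phase of the classical two-phase scheme described above can be sketched as follows. This is a minimal illustration that assumes the sampling probabilities are already given; the function name and the dictionary representation are hypothetical.

```python
import random


def sample_skeleton(probabilities, rng):
    """Keep each edge e independently with probability p_e, at weight 1/p_e."""
    skeleton = {}
    for edge, p in probabilities.items():
        # rng.random() lies in [0, 1), so p = 1 always keeps the edge
        # and p = 0 never does (no division by zero can occur).
        if rng.random() < p:
            skeleton[edge] = 1.0 / p
    return skeleton


# Hypothetical probabilities for three unit-weight edges.
example_probabilities = {("a", "b"): 1.0, ("b", "c"): 0.0, ("c", "d"): 0.25}
```

Because a kept edge receives weight 1/p_e, the expected weight of every edge (and hence of every cut) in the skeleton equals its weight in the input graph; the difficulty lies in showing concentration for all cuts simultaneously.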

At a high level, our algorithm has the same iterative structure as the algorithms in [3] and [9]. In each
iteration, the algorithm identifies suitable sampling probabilities of a subset of edges and removes them from 
the graph. It is in what the algorithm does with the remaining edges that our algorithm differs from previous 
work. While all remaining edges are retained for the next iteration in previous algorithms, we sample all the 
edges with probability 1/2 and retain only half of them in expectation for the next iteration. The intuition 
behind this sampling comes from the observation that the sampling probabilities decrease (approximately 
by a factor of 2) with every iteration; therefore, a natural approach is to sample the remaining edges with 
probability 1/2 and retain only the selected edges thereby reducing the time complexity of the next iteration. 

Let $X_i$ be the set of remaining edges at the beginning of iteration $i$, $F_i$ be the set of edges whose
sampling probabilities are determined in iteration $i$, and $Y_i = X_i \setminus F_i$ be the set of remaining edges after
iteration $i$. (Note that $X_{i+1}$ is therefore constructed by sampling each edge in $Y_i$ with probability 1/2.) Our
proof technique consists of two parts. First, we show that the graph $S$ containing appropriately weighted
edges in $\cup_i F_i$ is in $(1 \pm \varepsilon/3)G$ whp, i.e. even though edges in $Y_i \setminus X_{i+1}$ are sampled out between iterations
$i$ and $i + 1$ for each $i$, the retained edges (when weighted appropriately) are sufficient to preserve all cuts.
In the second step of the proof, we show that the skeleton graph $G_\varepsilon$ constructed by sampling edges in $\cup_i F_i$
and giving them appropriate weights is in $(1 \pm \varepsilon/3)S$ whp. For this proof, we use the generic sparsification
framework developed recently by Hariharan and Panigrahi [9]. Combining these two steps, we conclude
that $G_\varepsilon \in (1 \pm \varepsilon)G$ whp.

Roadmap. In Section 2, we give an outline of the generic sampling framework from [9] that we use later
in our proof. In Section 3, we describe our sparsification algorithm, prove its correctness and derive its time
complexity. Finally, we conclude with some open questions in Section 4.

2 Preliminaries 

We first need to introduce the notion of $k$-heavy edges, for any $k > 0$.

Definition 1. An edge $e = (u, v)$ of an undirected graph $G = (V, E)$ is said to be $k$-heavy if the maximum
flow between vertices $u$ and $v$ in $G$ is at least $k$.

By Menger's theorem (see e.g. [4]), it follows that if $e = (u, v)$ is $k$-heavy, then the weight of every cut
in $G$ having $u$ and $v$ on different sides is at least $k$.
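
The definition can be checked directly on small graphs with any maximum-flow routine. The sketch below uses a plain BFS augmenting-path method (Edmonds-Karp) on integer weights; it is meant only to illustrate the definition and is not the procedure used by the algorithm in this paper, which never computes flows explicitly. All names are assumptions of this example.

```python
from collections import defaultdict, deque


def max_flow(edges, s, t):
    """Max s-t flow in an undirected graph given as (u, v, weight) triples."""
    if s == t:
        raise ValueError("source and sink must differ")
    cap = defaultdict(lambda: defaultdict(int))
    for u, v, w in edges:
        cap[u][v] += w  # an undirected edge can carry w units either way
        cap[v][u] += w
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return flow
        # Find the bottleneck capacity along the path, then augment.
        bottleneck, y = None, t
        while parent[y] is not None:
            x = parent[y]
            bottleneck = cap[x][y] if bottleneck is None else min(bottleneck, cap[x][y])
            y = x
        y = t
        while parent[y] is not None:
            x = parent[y]
            cap[x][y] -= bottleneck
            cap[y][x] += bottleneck
            y = x
        flow += bottleneck


def is_k_heavy(edges, edge, k):
    """True if the max flow between the endpoints of `edge` is at least k."""
    u, v = edge
    return max_flow(edges, u, v) >= k


# Complete graph on four vertices with unit weights.
k4 = [("a", "b", 1), ("a", "c", 1), ("a", "d", 1),
      ("b", "c", 1), ("b", "d", 1), ("c", "d", 1)]
```

In the complete graph on four vertices every edge is 3-heavy but not 4-heavy, since each vertex has degree 3.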

2.1 Outline of Sparsification Framework from [9]

Suppose $G = (V, E)$ is an undirected graph where edge $e \in E$ has a positive integer weight $w_e$. Let $G_M =
(V, E_M)$ denote the multi-graph constructed by replacing each edge $e$ by $w_e$ unweighted parallel edges
$e_1, e_2, \ldots, e_{w_e}$. Consider any $\varepsilon \in (0, 1]$. Suppose we construct a skeleton $G_\varepsilon$ where each edge $e_i \in E_M$ is
present in graph $G_\varepsilon$ independently with probability $p_e$, and if present, it is given a weight of $1/p_e$. Let
$p_e = \min\left(\frac{96 \alpha \ln n}{0.38 \lambda_e \varepsilon^2}, 1\right)$, where $\alpha$ is independent of $e$ and $\lambda_e$ is some parameter of $e$ satisfying $\lambda_e \le 2^n - 1$. The
authors describe a sufficient condition that characterizes a good choice of $\alpha$ and $\lambda_e$'s.

To describe this sufficient condition, partition the edges in $G_M$ according to the value of $\lambda_e$ into sets
$R_0, R_1, \ldots, R_K$, where $K = \lfloor \lg \max_{e \in E} \lambda_e \rfloor \le n - 1$ and $e_i \in R_j$ iff $2^j \le \lambda_e \le 2^{j+1} - 1$. Now, let
$\mathcal{Q} = (Q_0, Q_1, Q_2, \ldots, Q_i = (V, W_i), \ldots, Q_K)$ be a sequence of subgraphs of $G_M$ (edges of $G_M$ are allowed to be
replicated multiple times in the $Q_i$'s) such that $R_i \subseteq W_i$ for every $i$. $\mathcal{Q}$ is said to be a $(\pi, \alpha)$-certificate
corresponding to the above choice of $\alpha$ and $\lambda_e$'s if the following properties are satisfied:

$\pi$-connectivity: For $i \ge 0$, any edge $e \in R_i$ is $\pi$-heavy in $Q_i$.

$\alpha$-overlap: For any cut $C$ containing $c$ edges in $G_M$, let $w_i^{(C)}$ be the number of edges that cross $C$ in $Q_i$.

Then, for all cuts C, i:to ^^^^t" ^ «<^- 

Then, the following theorem holds. 

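Theorem 4 (Hariharan-Panigrahi [9] (Theorem 8)). If there exists a $(\pi, \alpha)$-certificate for a particular
choice of $\alpha$ and $\lambda_e$'s, then the skeleton $G_\varepsilon \in (1 \pm \varepsilon)G$ with probability at least $1 - 4/n$. Further, $G_\varepsilon$
has $O\left(\frac{\alpha \log n}{\varepsilon^2} \sum_{e \in E} \frac{w_e}{\lambda_e}\right)$ edges in expectation.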



We also need the following lemma, which is a slight variation of Lemma 5 from [9]. (For completeness,
we give a proof in the appendix.) For an undirected unweighted graph $G = (V, E)$, let $R \subseteq E$ and $Q \supseteq R$
be subsets of edges such that $R$ is $\pi$-heavy in $(V, Q)$. Suppose each edge $e \in R$ is sampled with probability
$p$, and if selected, given a weight of $1/p$ to form a set of weighted edges $\hat{R}$. Now, for any cut $C$ in $G$, let
$R^{(C)}$, $Q^{(C)}$ and $\hat{R}^{(C)}$ be the sets of edges crossing cut $C$ in $R$, $Q$ and $\hat{R}$ respectively; also let the total weight
of edges in $R^{(C)}$, $Q^{(C)}$ and $\hat{R}^{(C)}$ be $r^{(C)}$, $q^{(C)}$ and $\hat{r}^{(C)}$ respectively. Then the following lemma holds.

Lemma 1. For any $\delta \in (0, 1]$ satisfying $\delta^2 \cdot p \cdot \pi \ge \frac{6 \ln n}{0.38}$,

\[ |r^{(C)} - \hat{r}^{(C)}| \le \delta\, q^{(C)} \]

for all cuts $C$ in $G$ with probability at least $1 - 4/n^2$.

2.2 Nagamochi-Ibaraki Forests 

We first introduce the notion of spanning forests of a graph. As earlier, $G$ denotes a graph with integer edge
weights $w_e$ for edge $e$ and $G_M$ is the unweighted multi-graph where $e$ is replaced with $w_e$ parallel unweighted
edges.

Definition 2. A spanning forest $T$ of $G_M$ (or equivalently of $G$) is an (unweighted) acyclic subgraph of $G$
satisfying the property that any two vertices are connected in $T$ if and only if they are connected in $G$.

We partition the set of edges in $G_M$ into a set of forests $T_1, T_2, \ldots$ using the following rule: $T_i$ is a spanning
forest of the graph formed by removing all edges in $T_1, T_2, \ldots, T_{i-1}$ from $G_M$ such that for any edge $e \in G$, all
its copies in $G_M$ appear in a set of contiguous forests $T_{i_e}, T_{i_e+1}, \ldots, T_{i_e+w_e-1}$. This partitioning technique was
introduced by Nagamochi and Ibaraki in [18], and these forests are known as Nagamochi-Ibaraki forests (or
NI forests). The following is a basic property of NI forests.

Lemma 2 (Nagamochi-Ibaraki [18, 17]). For any pair of vertices $u, v$, they are connected in NI forests
$T_1, T_2, \ldots, T_{k(u,v)}$ for some $k(u, v)$ and not connected in any forest $T_j$, for $j > k(u, v)$.

Nagamochi and Ibaraki also gave an algorithm for constructing NI forests that runs in $O(m + n)$ time if $G_M$
is a simple graph (i.e. $G$ is unweighted) and $O(m + n \log n)$ time otherwise [18, 17].
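
For intuition, the partitioning rule can be sketched for a simple unweighted graph by repeatedly peeling off a spanning forest of the remaining edges with a union-find structure. This is only an illustration of the rule stated above: it takes time proportional to the number of edges per forest and is not the linear-time construction of Nagamochi and Ibaraki. The function name is hypothetical.

```python
def successive_forests(vertices, edges):
    """Partition the edges of a simple graph into successive spanning forests."""
    remaining = [(u, v) for u, v in edges if u != v]  # ignore self-loops
    forests = []
    while remaining:
        parent = {v: v for v in vertices}

        def find(x):
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        forest, leftover = [], []
        for u, v in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv          # u and v were not yet connected
                forest.append((u, v))
            else:
                leftover.append((u, v))  # would close a cycle; defer it
        forests.append(forest)
        remaining = leftover
    return forests


# Complete graph on four vertices.
k4_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
```

Each forest spans the graph left after removing the earlier forests, so two vertices connected in a later forest are also connected in every earlier one, which is the nesting property of Lemma 2.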



3 The Algorithm 

We describe our sparsification algorithm for an unweighted graph $G = (V, E)$ with $m$ edges and $n$ vertices,
and an error parameter $\varepsilon \in (0, 1]$ as inputs. We prove that the skeleton graph $G_\varepsilon$ produced by the algorithm
is in $(1 \pm \varepsilon)G$ with high probability. We then show that the expected number of edges in $G_\varepsilon$ is $O(n \log n/\varepsilon^2)$.
Finally, we prove that the expected time complexity of the algorithm is $O(m)$.

Description of the Algorithm. The algorithm has three phases. The first phase has the following steps:

• If $m \le 2\rho n$, where $\rho = \frac{1014 \ln n}{0.38 \varepsilon^2}$, $G$ is sparse enough itself. Therefore, we take $G$ as our skeleton graph.

• Otherwise, we construct a set of NI forests of $G$ and all edges in the first $2\rho$ NI forests are included in
the skeleton graph $G_\varepsilon$ with weight 1. We call these edges $F_0$. The edge set $Y_0$ is then defined as $E \setminus F_0$.

The second phase is iterative. The input to iteration $i$ is a graph $(V, Y_{i-1})$, which is a subgraph of the input
graph to iteration $i - 1$ (i.e. $Y_{i-1} \subseteq Y_{i-2}$). Iteration $i$ comprises the following steps:

• If the number of edges in $Y_{i-1}$ is at most $2\rho n$, we take all those edges in $G_\varepsilon$ with weight $2^{i-1}$ each,
and terminate the algorithm.

• Otherwise, all edges in $Y_{i-1}$ are sampled with probability 1/2; call the sample $X_i$ and let $G_i = (V, X_i)$.

• We identify a set of edges in $X_i$ (call this set $F_i$) that has the following properties:

- The number of edges in $F_i$ is at most $2k_i |V_c|$, where $k_i = \rho \cdot 2^{i+1}$, and $V_c$ is the set of components
in $(V, Y_i)$, where $Y_i = X_i \setminus F_i$.

- Each edge in $Y_i$ is $k_i$-heavy in $G_i$.

• We give a sampling probability $p_i = \min\left(\frac{3}{169 \cdot 2^{2i-9}}, 1\right)$ to all edges in $F_i$.

The final phase consists of replacing each edge in $F_i$ (for each $i$) with $2^i$ parallel edges, and then sampling
each parallel edge independently with probability $p_i$. If an edge is selected in the sample, it is added to the
skeleton graph $G_\varepsilon$ with weight $1/p_i$.
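
The final phase can be sketched for a single edge as follows: the number of selected parallel copies is a Binomial(2^i, p_i) variable, and the copies that survive contribute weight 1/p_i each. The function name is hypothetical, and the binomial draw is simulated with independent coin flips so that only the standard library is needed.

```python
import random


def sample_parallel_copies(i, p, rng):
    """Sample 2**i parallel copies of an edge; return its total weight or None."""
    copies = 2 ** i
    # Number of selected copies, distributed as Binomial(2^i, p).
    selected = sum(1 for _ in range(copies) if rng.random() < p)
    if selected == 0:
        return None  # the edge does not appear in the skeleton
    return selected / p  # each selected copy carries weight 1/p
```

The expected total weight is 2^i, the weight that an edge of iteration i must represent after i rounds of halving.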

We now give a short description of the sub-routine that constructs the set $F_i$ in iteration $i$ of the second
phase of the algorithm. This sub-routine is iterative itself: we start with $V_c = V$ and $E_c = X_i$, and let
$G_c = (V_c, E_c)$. We repeatedly construct $k_i + 1$ NI forests for $G_c$, where $k_i = \rho \cdot 2^{i+1}$, and contract all edges
in the $(k_i + 1)$st forest to obtain a new $G_c$, until $|E_c| \le 2k_i |V_c|$. The set of edges $E_c$ that finally achieves this
property forms $F_i$.

The complete algorithm is given in Figure 1.

Cut Preservation. We first show that the skeleton graph $G_\varepsilon$ produced by the above algorithm is in $(1 \pm \varepsilon)G$
with high probability. We use the following notation throughout: for any set of unweighted edges $Z$, $cZ$
denotes these edges with a weight of $c$ given to each edge.
Our goal is to prove the following theorem.

Theorem 5. $G_\varepsilon \in (1 \pm \varepsilon)G$ with probability at least $1 - 8/n$.

As outlined in the introduction, our proof has two stages. Let $K$ be the maximum value of $i$ for which $F_i \ne \emptyset$;
let $S = (\cup_{i=0}^{K} 2^i F_i) \cup 2^K Y_K$ and $G_S = (V, S)$. Then, in the first stage, we prove the following theorem.



1. Set $\rho = \frac{1014 \ln n}{0.38 \varepsilon^2}$.

2. If $m \le 2\rho n$, then $G_\varepsilon = G$; else, go to step 3.

3. Construct NI forests $T_1, T_2, \ldots$ for $G$.

4. Set $i = 0$.

5. Set $X_i = E$; $F_i = \cup_{1 \le j \le 2\rho} T_j$; $Y_i = X_i \setminus F_i$.

6. Add each edge in $F_i$ to $G_\varepsilon$ with weight 1.

7. If $|Y_i| \le 2\rho n$, then add each edge in $Y_i$ to $G_\varepsilon$ with weight $2^i$ and terminate; else, go to step 8.

8. Sample each edge in $Y_i$ with probability 1/2 to construct $X_{i+1}$.

9. Increment $i$ by 1.

10. Set $E_c = X_i$; $V_c = V$.

11. Set $k_i = \rho \cdot 2^{i+1}$.

12. If $|E_c| \le 2k_i |V_c|$, then

(a) Set $F_i = E_c$; $Y_i = X_i \setminus E_c$.

(b) For each edge $e \in F_i$, set $\lambda_e = \rho \cdot 4^i$.

(c) Go to step 7.

Else,

(a) Construct NI forests $T_1, T_2, \ldots, T_{k_i+1}$ for graph $G_c = (V_c, E_c)$.

(b) Update $G_c = (V_c, E_c)$ by contracting all edges in $T_{k_i+1}$.

(c) Go to step 12.

13. For each edge $e \in \cup_i F_i$,

(a) Set $p_e = \min\left(\frac{9216 \ln n}{0.38 \lambda_e \varepsilon^2}, 1\right) = \min\left(\frac{3}{169 \cdot 2^{2i-9}}, 1\right)$.

(b) Generate $r_e$ from Binomial($2^i$, $p_e$).

(c) If $r_e > 0$, add edge $e$ to $G_\varepsilon$ with weight $r_e/p_e$.

Figure 1: Our sparsification algorithm



Theorem 6. $G_S \in (1 \pm \varepsilon/3)G$ with probability at least $1 - 4/n$.

In the second stage, we prove the following theorem.

Theorem 7. $G_\varepsilon \in (1 \pm \varepsilon/3)G_S$ with probability at least $1 - 4/n$.

Combining the above two theorems and using the union bound, we obtain Theorem 5. (Observe that since
$\varepsilon \le 1$, $(1 + \varepsilon/3)^2 \le 1 + \varepsilon$ and $(1 - \varepsilon/3)^2 \ge 1 - \varepsilon$.)

The following property is key to proving both Theorem 6 and Theorem 7.

Lemma 3. For any $i \ge 0$, any edge $e \in Y_i$ is $k_i$-heavy in $G_i = (V, X_i)$, where $k_i = \rho \cdot 2^{i+1}$.

Proof. For $i = 0$, all edges in $Y_0$ are in NI forests $T_{2\rho+1}, T_{2\rho+2}, \ldots$ of $G_0 = G$. The proof follows from
Lemma 2.

We now prove the lemma for $i \ge 1$. Let $G_e = (V_e, E_e)$ be the component of $G_i$ containing $e$. We will
show that $e$ is $k_i$-heavy in $G_e$; since $G_e$ is a subgraph of $G_i$, the lemma follows. In the execution of the else
block of step 12 on $G_e$, there are multiple contraction operations, each of them comprising the contraction
of a set of edges. We show that any such contracted edge is $k_i$-heavy in $G_e$; it follows that $e$ is $k_i$-heavy in
$G_e$.

Let $G_e$ have $t$ contraction phases and let the graph produced after contraction phase $r$ be $G_{e,r}$. We now
prove that all edges contracted in phase $r$ must be $k_i$-heavy in $G_e$ by induction on $r$. For $r = 1$, since $e$ appears
in the $(k_i + 1)$st NI forest of phase 1, $e$ is $k_i$-heavy in $G_e$ by Lemma 2. For the inductive step, assume that
the property holds for phases $1, 2, \ldots, r$. Any edge that is contracted in phase $r + 1$ appears in the $(k_i + 1)$st
NI forest of phase $r + 1$; therefore, $e$ is $k_i$-connected in $G_{e,r}$ by Lemma 2. By the inductive hypothesis, all
edges of $G_e$ contracted in previous phases are $k_i$-heavy in $G_e$; therefore, an edge that is $k_i$-heavy in $G_{e,r}$ must
have been $k_i$-heavy in $G_e$. □



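Proof of Theorem 6. The next lemma follows from Lemma 1.

Lemma 4. With probability at least $1 - 4/n^2$, for every cut $C$ in $G_i$,

\[ \left| 2x_{i+1}^{(C)} + f_i^{(C)} - x_i^{(C)} \right| \le \frac{\varepsilon}{13 \cdot 2^{i/2}}\, x_i^{(C)}. \]

Proof. Use the following parameters in Lemma 1:

• $R = Y_i$; $Q = X_i$; $\hat{R} = 2X_{i+1}$

• $\delta = \frac{\varepsilon}{13 \cdot 2^{i/2}}$; $p = 1/2$; $\pi = \rho \cdot 2^{i+1}$

Lemma 3 ensures that $R$ is $\pi$-heavy in $(V, Q)$; also, it can be verified that $\delta^2 \cdot p \cdot \pi = \frac{6 \ln n}{0.38}$. □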
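
The arithmetic behind the last claim can be checked numerically. The sketch below assumes $\rho = 1014 \ln n/(0.38 \varepsilon^2)$, $\delta = \varepsilon/(13 \cdot 2^{i/2})$, $p = 1/2$ and $\pi = \rho \cdot 2^{i+1}$, as read from the algorithm and the proof above; the function names are hypothetical.

```python
import math


def rho(n, eps):
    """The parameter rho = 1014 ln n / (0.38 eps^2) (assumed reading)."""
    return 1014 * math.log(n) / (0.38 * eps ** 2)


def lemma4_product(n, eps, i):
    """Evaluate delta^2 * p * pi for the parameters used in Lemma 4."""
    delta = eps / (13 * 2 ** (i / 2))
    p = 0.5
    pi = rho(n, eps) * 2 ** (i + 1)
    return delta ** 2 * p * pi


def lemma1_threshold(n):
    """The right-hand side 6 ln n / 0.38 of the condition in Lemma 1."""
    return 6 * math.log(n) / 0.38
```

The dependence on both $\varepsilon$ and $i$ cancels because $1014 = 6 \cdot 13^2$, so the product equals the threshold for every iteration.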

We use the above lemma to prove the following lemma. 

Lemma 5. Let $S_j = (\cup_{i=j}^{K} 2^{i-j} F_i) \cup 2^{K-j} Y_K$ for any $j \ge 0$. Then, $S_j \in (1 \pm (\varepsilon/3) 2^{-j/2}) G_j$ with probability
at least $1 - 4/n$, where $G_j = (V, X_j)$.

To prove this lemma, we need to use the following fact.

Fact 1. Let $x \in (0, 1]$ and $r_i = 13 \cdot 2^{i/2}$. Then, for any $k \ge 0$,

\[ \prod_{i=0}^{k} (1 + x/r_i) \le 1 + x/3, \qquad \prod_{i=0}^{k} (1 - x/r_i) \ge 1 - x/3. \]


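Proof. We prove by induction on $k$. For $k = 0$, the property trivially holds. Suppose the property holds for
$k - 1$. Then,

\[
\begin{aligned}
\prod_{i=0}^{k} (1 + x/r_i) &= \prod_{i=0}^{k} \left(1 + \frac{x}{13 \cdot 2^{i/2}}\right)
 = (1 + x/13) \cdot \prod_{i=1}^{k} \left(1 + \frac{x/\sqrt{2}}{13 \cdot 2^{(i-1)/2}}\right) \\
 &\le (1 + x/13) \cdot (1 + x/(3\sqrt{2})) \\
 &\le 1 + x/3,
\end{aligned}
\]

\[
\begin{aligned}
\prod_{i=0}^{k} (1 - x/r_i) &= \prod_{i=0}^{k} \left(1 - \frac{x}{13 \cdot 2^{i/2}}\right)
 = (1 - x/13) \cdot \prod_{i=1}^{k} \left(1 - \frac{x/\sqrt{2}}{13 \cdot 2^{(i-1)/2}}\right) \\
 &\ge (1 - x/13) \cdot (1 - x/(3\sqrt{2})) \\
 &\ge 1 - x/3.
\end{aligned}
\]

□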
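
Fact 1 is easy to sanity-check numerically. The sketch below evaluates both products for $r_i = 13 \cdot 2^{i/2}$; the function name is hypothetical, and the check complements the inductive proof rather than replacing it.

```python
def fact1_products(x, k):
    """Return (prod of 1 + x/r_i, prod of 1 - x/r_i) for i = 0..k, r_i = 13 * 2^(i/2)."""
    upper = lower = 1.0
    for i in range(k + 1):
        r_i = 13 * 2 ** (i / 2)
        upper *= 1 + x / r_i
        lower *= 1 - x / r_i
    return upper, lower
```

Since the $1/r_i$ form a geometric series summing to less than $0.263$, both bounds hold with some slack even as $k$ grows.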
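Proof of Lemma 5. For any cut $C$ in $G$, let the edges crossing $C$ in $S_j$ be $S_j^{(C)}$, and let their total weight be
$s_j^{(C)}$. Also, let $X_i^{(C)}$, $Y_i^{(C)}$ and $F_i^{(C)}$ be the sets of edges crossing cut $C$ in $X_i$, $Y_i$ and $F_i$ respectively, and let
their total weights be $x_i^{(C)}$, $y_i^{(C)}$ and $f_i^{(C)}$. (Recall that all edges in $X_i$, $Y_i$ and $F_i$ are unweighted; therefore
$x_i^{(C)} = |X_i^{(C)}|$, $y_i^{(C)} = |Y_i^{(C)}|$ and $f_i^{(C)} = |F_i^{(C)}|$.)

Since $K \le n - 1$, we can use the union bound on Lemma 4 to conclude that with probability at least
$1 - 4/n$, for every $0 \le i \le K$ and for all cuts $C$,

\[ 2x_{i+1}^{(C)} + f_i^{(C)} \le (1 + \varepsilon/r_i)\, x_i^{(C)}, \]

\[ 2x_{i+1}^{(C)} + f_i^{(C)} \ge (1 - \varepsilon/r_i)\, x_i^{(C)}, \]

where $r_i = 13 \cdot 2^{i/2}$. Then,

\[
\begin{aligned}
s_j^{(C)} &= 2^{K-j} y_K^{(C)} + 2^{K-j} f_K^{(C)} + 2^{K-j-1} f_{K-1}^{(C)} + \ldots + f_j^{(C)} \\
 &= 2^{K-j} x_K^{(C)} + 2^{K-j-1} f_{K-1}^{(C)} + \ldots + f_j^{(C)} \qquad \text{since } y_K^{(C)} + f_K^{(C)} = x_K^{(C)} \\
 &= 2^{K-j-1} \left(2x_K^{(C)} + f_{K-1}^{(C)}\right) + \left(2^{K-j-2} f_{K-2}^{(C)} + \ldots + f_j^{(C)}\right) \\
 &\le (1 + \varepsilon/r_{K-1})\, 2^{K-j-1} x_{K-1}^{(C)} + \left(2^{K-j-2} f_{K-2}^{(C)} + \ldots + f_j^{(C)}\right) \\
 &\le (1 + \varepsilon/r_{K-1}) \left(2^{K-j-1} x_{K-1}^{(C)} + 2^{K-j-2} f_{K-2}^{(C)} + \ldots + f_j^{(C)}\right) \\
 &\;\;\vdots \\
 &\le (1 + \varepsilon/r_{K-1})(1 + \varepsilon/r_{K-2}) \cdots (1 + \varepsilon/r_j)\, x_j^{(C)} \\
 &= \left(1 + \frac{\varepsilon 2^{-j/2}}{r_{K-1-j}}\right) \left(1 + \frac{\varepsilon 2^{-j/2}}{r_{K-2-j}}\right) \cdots \left(1 + \frac{\varepsilon 2^{-j/2}}{r_0}\right) x_j^{(C)} \qquad \text{since } r_{i+j} = r_i \cdot 2^{j/2} \\
 &\le \left(1 + (\varepsilon/3)\, 2^{-j/2}\right) x_j^{(C)} \qquad \text{by Fact 1.}
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
s_j^{(C)} &= 2^{K-j} x_K^{(C)} + 2^{K-j-1} f_{K-1}^{(C)} + \ldots + f_j^{(C)} \qquad \text{since } y_K^{(C)} + f_K^{(C)} = x_K^{(C)} \\
 &= 2^{K-j-1} \left(2x_K^{(C)} + f_{K-1}^{(C)}\right) + \left(2^{K-j-2} f_{K-2}^{(C)} + \ldots + f_j^{(C)}\right) \\
 &\ge (1 - \varepsilon/r_{K-1})\, 2^{K-j-1} x_{K-1}^{(C)} + \left(2^{K-j-2} f_{K-2}^{(C)} + \ldots + f_j^{(C)}\right) \\
 &\ge (1 - \varepsilon/r_{K-1}) \left(2^{K-j-1} x_{K-1}^{(C)} + 2^{K-j-2} f_{K-2}^{(C)} + \ldots + f_j^{(C)}\right) \\
 &\;\;\vdots \\
 &\ge (1 - \varepsilon/r_{K-1})(1 - \varepsilon/r_{K-2}) \cdots (1 - \varepsilon/r_j)\, x_j^{(C)} \\
 &\ge \left(1 - (\varepsilon/3)\, 2^{-j/2}\right) x_j^{(C)} \qquad \text{by Fact 1, since } r_{i+j} = r_i \cdot 2^{j/2}.
\end{aligned}
\]

□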



Theorem 6 now follows as a corollary of the above lemma for $j = 0$.



Proof of Theorem 7. Now, we use the sparsification framework developed in [9] and outlined previously
in Section 2 to prove Theorem 7. Observe that edges $F_0 \cup 2^K Y_K$ are identical in $G_S$ and $G_\varepsilon$. Therefore, we do
not consider these edges in the analysis below.

For any $i \ge 1$, let $\psi(i)$ be such that $2^{\psi(i)} \le \rho \cdot 4^i \le 2^{\psi(i)+1} - 1$. Note that for any $j$, $\psi(i) = j$ for at most
one value of $i$. Then, for any $j \ge 1$, $R_j = F_i$ if $j = \psi(i)$ and $R_j = \emptyset$ if there is no $i$ such that $j = \psi(i)$. We set
$\alpha = 32/3$; $\pi = \rho \cdot 4^K$; for any $j \ge 1$, $Q_j = (V, W_j)$ where $W_j = \cup_{i-1 \le r \le K}\, 4^{K-i+1} 2^r F_r$ if $R_j \ne \emptyset$ and $j = \psi(i)$,
and $W_j = \emptyset$ if $R_j = \emptyset$.

The following lemma proves $\pi$-connectivity.

Lemma 6. With probability at least $1 - 4/n$, every edge $e \in F_i = R_{\psi(i)}$ for each $i \ge 1$ is $\rho \cdot 4^K$-heavy in $Q_{\psi(i)}$.

Proof. Consider any edge $e \in F_i$. Since $F_i \subseteq Y_{i-1}$, Lemma 3 ensures that $e$ is $\rho \cdot 2^i$-heavy in $G_{i-1} = (V, X_{i-1})$,
and therefore $\rho \cdot 2^{2i-1}$-heavy in $(V, 2^{i-1} X_{i-1})$. Since $\varepsilon \le 1$, Lemma 5 ensures that with probability at least
$1 - 4/n$, the weight of each cut in $(V, 2^{i-1} X_{i-1})$ is preserved up to a factor of 2 in $Z_i = (V, \cup_{i-1 \le r \le K}\, 2^r F_r)$.
Thus, $e$ is $\rho \cdot 4^{i-1}$-heavy in $Z_i$.

Consider any cut $C$ containing $e \in F_i$. We need to show that the weight of this cut in $Q_{\psi(i)}$ is at least $\rho \cdot 4^K$.
Let the maximum $\lambda_a$ of an edge $a$ in $C$ be $\rho \cdot 4^{k_C}$, for some $k_C \ge i$. By the above proof, $a$ is $\rho \cdot 4^{k_C - 1}$-heavy
in $Z_{k_C}$. Then, the total weight of edges crossing cut $C$ in $Q_{\psi(k_C)}$ is at least $\rho \cdot 4^{k_C - 1} \cdot 4^{K - k_C + 1} = \rho \cdot 4^K$. Since
$k_C \ge i$, $\psi(k_C) \ge \psi(i)$ and $Q_{\psi(k_C)}$ is a subgraph of $Q_{\psi(i)}$. Therefore, the total weight of edges crossing cut
$C$ in $Q_{\psi(i)}$ is at least $\rho \cdot 4^K$. □

We now prove the $\alpha$-overlap property. For any cut $C$, let $f_i^{(C)}$ and $w_i^{(C)}$ respectively denote the total
weight of edges crossing cut $C$ in $F_i$ and $W_{\psi(i)}$ respectively for any $i \ge 0$. Further, let the number of edges
crossing cut $C$ in $\cup_{i=0}^{K} 2^i F_i$ be $f^{(C)}$. Then,



Z^ -n- — i-i On.AK ~ 2-iT.AK-i ~ 2-iO.AK-i ~ 2-i 2-i 



K K AC) K r+\ AC) K AC) r+1 09 A" , ^ 09 

Lv ■^'" = y y ■''' = Y ■''' V 2^'+' = — V2'Y^*^^ = — f(c) 
Z^ 9r-2!-l Z^ Z^ 9r-2i-l L-i '^r L-i ri I-i ■''' "K •' ' 

Using Theorem 4, we conclude the proof of Theorem 7.

Size of the skeleton graph. We now prove that the expected number of edges in $G_\varepsilon$ is $O(n \log n/\varepsilon^2)$.
For $i \ge 1$, define $D_i$ to be the set of connected components in the graph $G_i = (V, X_i)$; let $D_0$ be the single
connected component in $G$. For any $i \ge 1$, if any connected component in $D_i$ remains intact in $D_{i+1}$, then
there is no edge from that connected component in $F_i$. On the other hand, if a component in $D_i$ splits into $t_j$
components in $D_{i+1}$, then the algorithm explicitly ensures that $\sum_{e \in F_i} \frac{w_e}{\lambda_e}$ from that connected component is

\[ \sum_{e \in F_i} \frac{w_e}{\lambda_e} \le \frac{(2 k_i t_j)\, 2^i}{\rho \cdot 4^i} = 4 t_j \le 8(t_j - 1). \]

Therefore, if $d_i = |D_i|$, then

\[ \sum_{i} \sum_{e \in F_i} \frac{w_e}{\lambda_e} \le \sum_{i} 8(d_{i+1} - d_i) \le 8n, \]

since we can have at most $n$ singleton components. It follows from Theorem 4 that the expected number of
edges added to $G_\varepsilon$ by the sampling is $O(n \log n/\varepsilon^2)$. Since the number of edges added to $G_\varepsilon$ in steps 6 and 7
of the algorithm is $O(n \log n/\varepsilon^2)$, the total number of edges in $G_\varepsilon$ is $O(n \log n/\varepsilon^2)$.

Time complexity of the algorithm. If $m \le 2\rho n$, the algorithm terminates after the first step, which takes
$O(m)$ time. Otherwise, we prove that the expected running time of the algorithm is $O(m + n \log n/\varepsilon^2) =
O(m)$, since $\rho = O(\log n/\varepsilon^2)$. First, observe that phase 1 takes $O(m + n \log n)$ time. We will show that
iteration $i$ of phase 2 takes $O(|Y_{i-1}|)$ time. Since $Y_i \subseteq X_i$ and $\mathbb{E}[|X_i|] \le \mathbb{E}[|X_{i-1}|]/2$, and $|Y_0| \le m$, it follows
that the expected overall time complexity of phase 2 is $O(m)$. Finally, the time complexity of phase 3 is
$O(m + n \log n/\varepsilon^2)$ (see e.g. [10]).
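
The geometric decay behind the $O(m)$ bound for phase 2 can be illustrated with a small simulation in which each surviving edge is kept with probability 1/2 in every round. This is only a sketch: it ignores the edges removed as $F_i$ in each iteration (which can only shrink the sets further), and the function name is hypothetical.

```python
import random


def halving_sizes(m, rng):
    """Sizes of successive samples when each surviving edge is kept w.p. 1/2."""
    sizes = [m]
    while sizes[-1] > 0:
        kept = sum(1 for _ in range(sizes[-1]) if rng.random() < 0.5)
        sizes.append(kept)
    return sizes
```

In expectation the sizes are $m, m/2, m/4, \ldots$, so their sum is below $2m$; this is why work proportional to the current edge set in each iteration adds up to $O(m)$ overall.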

In iteration $i$ of phase 2, the first step takes $O(|Y_{i-1}|)$ time. We show that all the remaining steps take
$O(|X_i| + n \log n)$ time. Since $X_i \subseteq Y_{i-1}$ and the steps are executed only if $|Y_{i-1}| = \Omega(n \log n/\varepsilon^2)$, it follows that
the total time complexity of iteration $i$ of phase 2 is $O(|Y_{i-1}|)$.

First, observe that step 8 and the if block of step 12 take $O(|X_i|)$ time. So, we are left with the repeated
invocations of the else block of step 12. Each iteration of the else block takes $O(|V_c| \log n + |E_c|)$ time for
the current $V_c, E_c$. So, the last invocation of the else block takes at most $O(|X_i| + n \log n)$ time. In any other
invocation, $|E_c| = \Omega(|V_c| \log n)$ and hence the time spent is $O(|E_c|)$. We show that $|E_c|$ decreases by a factor
of 2 from one invocation of the else block to the next; then the total time over all invocations of the else
block is $O(|X_i| + n \log n)$.

To see that $|E_c|$ halves from one invocation of the else block to the next, consider an iteration that
begins with $|E_c| > 2k_i \cdot |V_c|$. By Lemma 2, $E_c$ for the next iteration (denoted by $E_c'$) comprises only edges in
the first $k_i$ NI forests constructed in the current iteration. So $|E_c'| \le k_i \cdot |V_c| \le |E_c|/2$.

4 Future Work

The obvious open question is whether these results can be extended to weighted graphs, at least if the 
weights are polynomially bounded in n. Another possibility is to extend these results to the semi-streaming 
model for unweighted graphs. A more ambitious open problem is to obtain an efficient (i.e. near-linear in m) 
algorithm that constructs a skeleton containing $o(n \log n)$ edges while approximately preserving the weights
of all cuts with high probability. 

A Proof of Lemma 1

To prove Lemma 1, we will need two theorems from [9]. The first theorem is a non-uniform extension of
Chernoff bounds.²



Theorem 8 (Hariharan-Panigrahi [9]). Consider any subset $C$ of unweighted edges, where each edge $e \in C$
is sampled independently with probability $p_e$ for some $p_e \in [0, 1]$ and given weight $1/p_e$ if selected in the
sample. Let the random variable $X_e$ denote the weight of edge $e$ in the sample; if $e$ is not selected in the
sample, then $X_e = 0$. Then, for any $p$ such that $p \le p_e$ for all edges $e$, any $\varepsilon \in (0, 1]$, and any $N \ge |C|$, the
following bound holds:³

\[ \mathbb{P}\left[\, \Big| \sum_{e \in C} X_e - |C| \Big| > \varepsilon N \right] < 2e^{-0.38 \varepsilon^2 p N}. \]



To state the second theorem, we need the following definitions. 

Definition 3. For any undirected graph G and for any k > 0, the k-projection of any cut C is the set of 
k-heavy edges in C. 

Definition 4. The edge connectivity of an undirected graph G is the minimum weight of a cut in G. 

The theorem counts the number of distinct $k$-projections in cuts of weight $\alpha k$ for any $k \ge c$, where $c$ is the
edge connectivity of the graph.



² For Chernoff bounds, see e.g. [16].

³ For any event $\mathcal{E}$, $\mathbb{P}[\mathcal{E}]$ represents the probability of event $\mathcal{E}$.

Theorem 9 (Hariharan-Panigrahi [9]). For any undirected graph with edge connectivity $c$ and for any $k \ge c$
and any $\alpha \ge 1$, the number of distinct $k$-projections of cuts of weight at most $\alpha k$ is at most $n^{2\alpha}$.

Using the above two theorems, we now prove Lemma 1.

Proof of Lemma 1. Let $\mathcal{C}_j$ be the set of all cuts $C$ such that $2^j \pi \le q^{(C)} \le 2^{j+1} \pi - 1$, $j \ge 0$. We will prove
that with probability at least $1 - 2n^{-2^{j+1}}$, all cuts in $\mathcal{C}_j$ satisfy the property of the lemma. Then, the lemma
follows by using the union bound over $j$ since $2n^{-2} + 2n^{-4} + \ldots + 2n^{-2^{j+1}} + \ldots \le 4n^{-2}$.

We now prove the property of the lemma for cuts $C \in \mathcal{C}_j$. Since each edge $e \in R^{(C)}$ is sampled with
probability $p$ in obtaining $\hat{R}^{(C)}$, we can use Theorem 8 with sampling probability $p$. Then, for any $R^{(C)}$
where $C \in \mathcal{C}_j$, by Theorem 8, we have

\[ \mathbb{P}\left[\, |r^{(C)} - \hat{r}^{(C)}| > \delta q^{(C)} \right] < 2e^{-0.38\, \delta^2 p\, q^{(C)}} \le 2e^{-0.38\, \delta^2 p\, \pi\, 2^j} \le 2e^{-6 \cdot 2^j \ln n} = 2n^{-6 \cdot 2^j}, \]

since $q^{(C)} \ge \pi \cdot 2^j$ for any $C \in \mathcal{C}_j$. Since each edge in $R^{(C)}$ is $\pi$-heavy in $(V, Q)$, Theorem 9 ensures that
the number of distinct $R^{(C)}$ sets for cuts $C \in \mathcal{C}_j$ is at most $n^{2 \cdot (\pi \cdot 2^{j+1}/\pi)} = n^{4 \cdot 2^j}$. Using the union bound over
these distinct $R^{(C)}$ edge sets, we conclude that with probability at least $1 - 2n^{-2^{j+1}}$, all cuts in $\mathcal{C}_j$ satisfy the
property of the lemma. □